# High-precision Visual Model
ViT Large Patch16 SigLIP GAP 384.webli
Apache-2.0
A Vision Transformer (ViT) model based on SigLIP that uses global average pooling (GAP), suitable for image feature extraction.
Image Classification
Transformers

timm
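The first entry describes a ViT that produces its image feature through global average pooling. The sketch below illustrates only that pooling step. It is not timm's implementation. It averages a list of patch-token embeddings into a single vector in plain Python, and the function name `global_average_pool` is hypothetical.

For a 384x384 input split into 16x16 patches, a ViT produces (384 / 16)^2 = 576 patch tokens. A sequence of that size is what this kind of pooling collapses into one feature vector.

```python
from typing import List


def global_average_pool(tokens: List[List[float]]) -> List[float]:
    """Average N token embeddings of dimension D into one D-dim vector.

    Hypothetical helper for illustration; a real model does this on tensors.
    """
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    if any(len(t) != dim for t in tokens):
        raise ValueError("all tokens must share the same dimension")
    n = len(tokens)
    # Mean over the token axis, computed separately for each feature dimension.
    return [sum(t[d] for t in tokens) / n for d in range(dim)]


# Number of patch tokens for a 384x384 image with 16x16 patches.
num_patches = (384 // 16) ** 2  # 576

# Toy example: two 2-dimensional tokens.
pooled = global_average_pool([[1.0, 2.0], [3.0, 4.0]])
print(pooled)  # [2.0, 3.0]
```

Averaging over all tokens gives each patch equal weight in the final feature. The alternative is to read the output from a dedicated class token.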
ResNet50x4 CLIP.openai
MIT
A ResNet50x4 vision-language model based on the CLIP architecture, supporting zero-shot image classification.
Image-to-Text
timm
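The second entry mentions zero-shot image classification. CLIP-style models typically do this in four steps:

1. Embed the image.
2. Embed one text prompt per candidate class.
3. Compare the image embedding with each prompt embedding by cosine similarity.
4. Apply a softmax over the scaled similarities.

The sketch below shows only that scoring step. It uses toy two-dimensional vectors in place of real model embeddings. The function names and the `logit_scale` default of 100.0 are assumptions for illustration, not values read from this model.

```python
import math
from typing import Dict, List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def zero_shot_probs(
    image_emb: List[float],
    text_embs: Dict[str, List[float]],
    logit_scale: float = 100.0,  # assumed temperature, for illustration only
) -> Dict[str, float]:
    """Return a probability per class label from image/text embeddings (hypothetical helper)."""
    img = l2_normalize(image_emb)
    logits = {}
    for label, emb in text_embs.items():
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits[label] = logit_scale * cosine
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


probs = zero_shot_probs([0.9, 0.1], {"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
print(max(probs, key=probs.get))  # cat
```

In this toy case the image vector points almost in the "cat" prompt's direction, so nearly all of the probability goes to "cat". No class-specific training is needed, because new classes are added simply by embedding new prompts.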